Purpose

Recent studies have found that the bladder is not sterile and that specific bacteria, fungi, and viruses that make up the urobiome are associated with lower urinary tract symptoms (LUTS) in women. However, similar studies have not been undertaken in men, even though one-third of men over the age of 50 experience moderate to severe LUTS. To fill this gap, we will evaluate the urobiome and its relationship to LUTS in a large cohort of men from the Osteoporotic Fractures in Men (MrOS) study.

Samples 500 morning, second voided urine specimens that were collected in sterile containers and frozen as part of the MrOS study were obtained with associated clinical information. Bacterial DNA was extracted and the 16S rRNA amplicon was sequenced with Illumina MiSeq. Sequences were processed with DADA2 and analyzed in R. LUTS severity was classified with American Urologic Association Symptom Index ( 0 -7 none/mild; 8-35 moderate/severe) and is defined below:

LUTS categories -

According to AUA (american urology assoc), scores for each symptom go from 0 –> 5; 0 = never, 5 = almost always

  1. Over the past month, how often have you had the feeling of not completely emptying your bladder after you finished urinating? PSEMPTY
  2. Over the past month, how often have you had to urinate again less than 2 hours after you finished urinating? PSAGAIN
  3. Over the past month, how often have you found that you stopped and started again several times when you urinated? PSSTOP
  4. Over the past month, how often have you found it hard to hold your urine? PSPOST
  5. Over the past month, how often have you had a weak urine stream? PSWEAK
  6. Over the past month, how often have you had to push or strain to begin urination? PSPUSH
  7. Over the past month, have you had to get up to urinate during the night? Give a score to the number of times. PSUP

Found here: https://www.cigna.com/individuals-families/health-wellness/hw/medical-topics/american-urological-association-symptom-index-ug1952

Luts breakdown

  • LUTS: 1 - no LUTS, 2- Moderate LUTS 3- Severe LUTS
  • PSSCORE: 0-7 = no/mild LUTS, 8-19 = moderate LUTS, >20 = severe LUTS

Data origin

This ps object was cleaned up in Box/Kate/MrOS/data/Analyses/MrOS_EDA.Rmd. ALl of the data is described explicitly in that RMD. The ps object is rarefied

Set-up

library(phyloseq)
library(ggplot2)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ lubridate 1.9.3     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(patchwork)
library(kableExtra)
## 
## Attaching package: 'kableExtra'
## 
## The following object is masked from 'package:dplyr':
## 
##     group_rows
library(RColorBrewer)
library(speedyseq)
## 
## Attaching package: 'speedyseq'
## 
## The following objects are masked from 'package:phyloseq':
## 
##     filter_taxa, plot_bar, plot_heatmap, plot_tree, psmelt, tax_glom,
##     tip_glom, transform_sample_counts
library(here)
## here() starts at /Users/katebowie/Library/CloudStorage/OneDrive-YaleUniversity/MrOS/OHSU
library(microshades)
library(qwraps2)
## 
## Attaching package: 'qwraps2'
## 
## The following object is masked from 'package:ggplot2':
## 
##     mean_se
library(reshape2)
## 
## Attaching package: 'reshape2'
## 
## The following object is masked from 'package:tidyr':
## 
##     smiths
library(tableone)
library(vegan)
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.6-8
library(cowplot)
## 
## Attaching package: 'cowplot'
## 
## The following object is masked from 'package:patchwork':
## 
##     align_plots
## 
## The following object is masked from 'package:lubridate':
## 
##     stamp
library(ggbeeswarm)
library(ggpubr)
## 
## Attaching package: 'ggpubr'
## 
## The following object is masked from 'package:cowplot':
## 
##     get_legend
## 
## The following objects are masked from 'package:qwraps2':
## 
##     mean_ci, mean_sd, median_iqr
library(rstatix)
## 
## Attaching package: 'rstatix'
## 
## The following object is masked from 'package:stats':
## 
##     filter

Load-in data

load(here("data", "Final_Analyses", "ps_cleaned.RData"))

Cohort description

adjust categorical variables

Need to change categorical variables from 1 = Yes; 0 = No; into Yes or no for Table One to work properly.

# BPH history
meta_ps$BPH <- 'Yes'
meta_ps$BPH[meta_ps$PSBPH == 0] <- 'No'
meta_ps$BPH[is.na(meta_ps$PSBPH)] <- 'No'

# Prostatitis history
meta_ps$Prostatitis <- 'Yes'
meta_ps$Prostatitis[meta_ps$MHPROST == 0] <- 'No'
meta_ps$Prostatitis[is.na(meta_ps$MHPROST)] <- 'NA'

# Medical history of prostate cancer
# The variable 'MHPC'is for 'medical history of prostate cancer'. Currently the column has three values: NaN, 0, 1. NaN means no history of any cancer, while '0' means history of cancer (*not prostate*), and '1' is history of prostate cancer. To make this easier for later analyses, I need to create a new column that has more understandable variables.
meta_ps$Cancer <- 'Yes'
meta_ps$Cancer[is.na(meta_ps$MHPC)] <- 'No'

meta_ps$Cancer_Type <- 'Prostate'
meta_ps$Cancer_Type[meta_ps$MHPC == 0] <- 'Other'
meta_ps$Cancer_Type[is.na(meta_ps$MHPC)] <- 'None'

# urinary satisfaction score
meta_ps$QoL <- 'Happy'
meta_ps$QoL[meta_ps$PSQL == 3] <- 'Mixed'
meta_ps$QoL[meta_ps$PSQL > 3] <- 'Unhappy'

factorizing LUTS cat

For LUTS to show up in the order of no/mild then mod/severe, must be factored

meta_ps$LUTS_cat <- factor(meta_ps$LUTS_cat, levels = c("no/mild", "moderate/severe"))
sample_data(ps) <- meta_ps

Table 1. Clinical characteristics

Using table one (https://cran.r-project.org/web/packages/tableone/vignettes/introduction.html) ** take out Q of L**

# filtering the metadata for only the clinical characteristics needed to establish if covariates are an issue 
clin_char <- meta_ps %>%
  select(LUTS_cat, age, Race, HWBMI.x, Diabetes, Prostatitis, BPH, Cancer, Cancer_Type, QoL) 

clin_table <- CreateTableOne(strata = "LUTS_cat", data = clin_char, factorVars = "Race")
print(clin_table)
##                                 Stratified by LUTS_cat
##                                  no/mild       moderate/severe p      test
##   n                                265           210                      
##   LUTS_cat = moderate/severe (%)     0 ( 0.0)    210 (100.0)   <0.001     
##   age (mean (SD))                72.39 (5.24)  73.70 (6.12)     0.013     
##   Race = 2. Other (%)               34 (12.8)     21 ( 10.0)    0.416     
##   HWBMI.x (mean (SD))            27.85 (3.63)  27.99 (4.08)     0.693     
##   Diabetes (%)                                                  0.544     
##                                     19 ( 7.2)     21 ( 10.0)              
##      0. No                         207 (78.1)    159 ( 75.7)              
##      1. Yes                         39 (14.7)     30 ( 14.3)              
##   Prostatitis = Yes (%)             47 (17.7)     58 ( 27.6)    0.014     
##   BPH = Yes (%)                    101 (38.1)    127 ( 60.5)   <0.001     
##   Cancer = Yes (%)                  82 (30.9)     72 ( 34.3)    0.500     
##   Cancer_Type (%)                                               0.736     
##      None                          183 (69.1)    138 ( 65.7)              
##      Other                          47 (17.7)     42 ( 20.0)              
##      Prostate                       35 (13.2)     30 ( 14.3)              
##   QoL (%)                                                      <0.001     
##      Happy                         241 (90.9)     87 ( 41.4)              
##      Mixed                          20 ( 7.5)     68 ( 32.4)              
##      Unhappy                         4 ( 1.5)     55 ( 26.2)
# if want to show all levels of each variable, do 
# print(clin_table, showAllLevels = TRUE)

There is a statistically significant difference between the two groups when it comes to age, prostatitis, BPH, and the various quality of life categories. This probably means I should include them in my models later on.

clean-up

rm(clin_char, clin_table)

Overall sequencing results

How many were sequenced pos/neg

ps
## phyloseq-class experiment-level object
## otu_table()   OTU Table:          [ 729 taxa and 475 samples ]:
## sample_data() Sample Data:        [ 475 samples by 36 sample variables ]:
## tax_table()   Taxonomy Table:     [ 729 taxa by 7 taxonomic ranks ]:
## phy_tree()    Phylogenetic Tree:  [ 729 tips and 728 internal nodes ]:
## taxa are rows

Number of reads per sample and distribution

ggplot(meta_ps, aes(x = SampleSums)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

median(meta_ps$SampleSums)
## [1] 14558

The majority of samples have less than 10,000 reads. There are a few samples that have over 50,000 reads, however there is a question if those men had an undiagnosed UTI at the moment, in which their microbial load would be more than the other men. The range of reads is between 1000-100,000. The median number of reads per sample is 14,558.

Sample reads by group

meta_ps %>% 
  group_by(LUTS_cat) %>% 
  summarise(max_reads = max(SampleSums), min_reads = min(SampleSums), avg_reads = mean(SampleSums), sd_reads = sd(SampleSums))
LUTS_cat max_reads min_reads avg_reads sd_reads
no/mild 101017 1018 16368.71 14190.39
moderate/severe 81167 1184 19367.10 15395.35
shapiro_test(meta_ps$SampleSums)
variable statistic p.value
meta_ps$SampleSums 0.8822832 0
# non-normal

wilcox.test(meta_ps$SampleSums ~ meta_ps$LUTS_cat)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  meta_ps$SampleSums by meta_ps$LUTS_cat
## W = 24696, p-value = 0.03523
## alternative hypothesis: true location shift is not equal to 0

Number of phyla/class/order/family/genera

taxa_counts <- ps@tax_table %>% data.frame()

taxa_counts <- lapply(taxa_counts, unique)
taxa_counts <- lapply(taxa_counts, function(x) x[!is.na(x)]) # removing all NAs from each level

print(lengths(taxa_counts))
## Kingdom  Phylum   Class   Order  Family   Genus Species 
##       2      29      66     100     212     571       0
rm(taxa_counts) # removing because we don't really need this extra list in our environment

When examining the entire cohort of 475 men, they had a total of 2 Kingdoms (Bacteria and Archaea), 29 different Phylum, 66 classes, 100 orders, and 571 different genera. I know that in each of those categories, one of them maps to ‘NA’, so when reporting those numbers, I will report the number for each -1 to account for this.

Stacked Bar plot - Phylum

mdf_prep <- prep_mdf(ps, subgroup_level = "Phylum")
## Warning in speedyseq::psmelt(.): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
color_decontam <- create_color_dfs(mdf_prep, group_level = "Phylum")

# Extract
mdf <- color_decontam$mdf
cdf <- color_decontam$cdf
plot_microshades(mdf, cdf, group_label = "Phylum", x = "Sample") +
  theme(axis.text.x = element_blank()) +
    scale_y_continuous(labels = scales::percent, expand = expansion(0)) +
      theme(legend.position = "bottom", legend.title = element_blank(), legend.text = element_text(size = 8), legend.key.height = unit(0.2, "cm"), legend.key.width = unit(0.2, "cm")) +
  xlab("Sample Name") +
  ylab("Relative Abundance")  +
  theme(axis.title.x = element_text(size = 10), axis.title.y = element_text(size = 10))

# reorder_samples_by will change the order of samples based on an abundance of a specified subgroup taxonomy
firm_order <- reorder_samples_by(mdf, cdf, order = "Firmicutes",  sink_abundant_groups = FALSE)

mdf <- firm_order$mdf
cdf <- firm_order$cdf

#firm_plot <- 
plot_microshades(mdf, cdf, group_label = "Phylum Family") + 
  facet_wrap(~LUTS_cat, scales = "free_x") +
  theme(axis.text.x = element_blank()) +
    scale_y_continuous(labels = scales::percent, expand = expansion(0)) +
      theme(strip.text = element_text(size=14), legend.position = "bottom", legend.title = element_blank(), legend.text = element_text(size = 10), legend.key.height = unit(0.5, "cm"), legend.key.width = unit(0.2, "cm")) +
  xlab("Sample") +
  ylab("Relative Abundance")

Stacked Bar plot - Order

mdf_prep <- prep_mdf(ps, subgroup_level = "Order")
## Warning in speedyseq::psmelt(.): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
color_decontam <- create_color_dfs(mdf_prep, group_level = "Phylum", subgroup_level = "Order")

# Extract
mdf <- color_decontam$mdf
cdf <- color_decontam$cdf
# reorder_samples_by will change the order of samples based on an abundance of a specified subgroup taxonomy
firm_order <- reorder_samples_by(mdf, cdf, order = "Firmicutes", group_level = "Phylum", subgroup_level = "Order", sink_abundant_groups = FALSE)

#firm_legend <-custom_legend(firm_order$mdf, firm_order$cdf)

plot_microshades(firm_order$mdf, firm_order$cdf, group_label = "Phylum-Order") + theme(axis.text.x = element_blank()) +
    scale_y_continuous(labels = scales::percent, expand = expansion(0)) +
      theme(legend.position = "bottom", legend.title = element_blank(), legend.text = element_text(size = 8), legend.key.height = unit(0.2, "cm"), legend.key.width = unit(0.2, "cm")) +
  xlab("Sample") +
  ylab("Relative Abundance")

#plot_grid(firm_plot, firm_legend,  rel_widths = c(1, .25))

Stacked Bar plot - Genus

ps_g <- tax_glom(ps, "Genus")
ps_g <- subset_taxa(ps_g, taxa_sums(ps_g) > 0)

mdf_prep <- prep_mdf(ps_g, subgroup_level = "Genus")
## Warning in speedyseq::psmelt(.): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
color_decontam <- create_color_dfs(mdf_prep, group_level = "Phylum", subgroup_level = "Genus")

# Extract
mdf <- color_decontam$mdf
cdf <- color_decontam$cdf
# reorder_samples_by will change the order of samples based on an abundance of a specified subgroup taxonomy
firm_order <- reorder_samples_by(mdf, cdf, order = "Firmicutes", group_level = "Phylum", subgroup_level = "Genus", sink_abundant_groups = FALSE)

#firm_legend <-custom_legend(firm_order$mdf, firm_order$cdf)

plot_microshades(firm_order$mdf, firm_order$cdf, group_label = "Phylum-Genus") + theme(axis.text.x = element_blank()) +
    scale_y_continuous(labels = scales::percent, expand = expansion(0)) +
      theme(legend.title = element_blank(), legend.text = element_text(size = 12), legend.key.height = unit(0.2, "cm"), legend.key.width = unit(0.2, "cm")) +
  xlab("Sample") +
  ylab("Relative Abundance")

#plot_grid(firm_plot, firm_legend,  rel_widths = c(1, .25))

Overall Alpha diversity

Calculating overall alpha diversity

ps_g <- tax_glom(ps, "Genus")
ps_g <- subset_taxa(ps_g, taxa_sums(ps_g) > 0)

# adding pielou's evenness to the rest of the standard alpha div measures to our meta_ps object
meta_ps <- meta_ps %>% 
  mutate(estimate_richness(ps_g, measures = c("Observed", "Shannon", "InvSimpson"))) %>% 
  mutate(microbiome::evenness(ps_g, "pielou"))


## plotting alpha div
meta_ps_long <- melt(meta_ps) %>% filter(variable %in% c("Shannon", "Observed", "pielou", "InvSimpson"))
## Using ID, Sample, sample_type, SITE, marital_status, Race, Education, current_health, Diabetes, LUTS, LUTS_cat, BPH, Prostatitis, Cancer, Cancer_Type, QoL as id variables
ggplot(meta_ps_long, aes(x = variable, y = value)) + geom_boxplot() + facet_wrap(~variable, scales = "free", nrow = 1)
## Warning: Removed 3 rows containing non-finite outside the scale range
## (`stat_boxplot()`).

I summarize the alpha diversity metrics in the next code chunk, so I won’t say much here.

summary statistics

### remove NAs from pielou by replacing with a zero
meta_ps$pielou[is.na(meta_ps$pielou)] <- 0

meta_ps %>% #group_by(LUTS_cat) %>% 
  summarise(mean_obs = mean(Observed), mean_shan = mean(Shannon), mean_inv = mean(InvSimpson), mean_pie = mean(pielou), min_obs = min(Observed), max_obs = max(Observed), min_shan = min(Shannon), max_shan = max(Shannon), min_inv  = min(InvSimpson), max_inv  = max(InvSimpson), min_pie = min(pielou), max_pie = max(pielou), median_shan = median(Shannon), median_inv = median(InvSimpson), median_pie = median(pielou))
mean_obs mean_shan mean_inv mean_pie min_obs max_obs min_shan max_shan min_inv max_inv min_pie max_pie median_shan median_inv median_pie
23.66316 1.994173 5.677122 0.6414344 1 69 0 3.736307 1 28.17961 0 0.9286552 2.109964 5.128962 0.6674291

Beta diversity

Using an adonis plus the covariates from table one above, to see if any of the covariates are associated with the microbiomes beta diversity measures ### creating distances

dist_wu_ps <- phyloseq::distance(ps_g, method = "wunifrac")
dist_u_ps <- phyloseq::distance(ps_g, method = "unifrac")
dist_b_ps <- phyloseq::distance(ps_g, method = "bray")

weighted unifrac

# Age-adjusted
adonis2(dist_wu_ps~ age, data = meta_ps)
Df SumOfSqs R2 F Pr(>F)
Model 1 0.2594351 0.0050718 2.411183 0.019
Residual 473 50.8931988 0.9949282 NA NA
Total 474 51.1526339 1.0000000 NA NA
# BMI-adjusted (although BMI was not sig. different from table1)
adonis2(dist_wu_ps~ HWBMI.x, data = meta_ps, permutations = 10000)
Df SumOfSqs R2 F Pr(>F)
Model 1 0.4924886 0.0096278 4.598232 3e-04
Residual 473 50.6601453 0.9903722 NA NA
Total 474 51.1526339 1.0000000 NA NA
# Adjusted for prostatitis
adonis2(dist_wu_ps~ Prostatitis, data = meta_ps)
Df SumOfSqs R2 F Pr(>F)
Model 1 0.1298055 0.0025376 1.203344 0.262
Residual 473 51.0228283 0.9974624 NA NA
Total 474 51.1526339 1.0000000 NA NA
# adjusted for BPH
adonis2(dist_wu_ps~ BPH, data = meta_ps)
Df SumOfSqs R2 F Pr(>F)
Model 1 0.0746944 0.0014602 0.6916969 0.736
Residual 473 51.0779395 0.9985398 NA NA
Total 474 51.1526339 1.0000000 NA NA
# adjusted for Diabetes
adonis2(dist_wu_ps~ Diabetes, data = meta_ps)
Df SumOfSqs R2 F Pr(>F)
Model 2 0.1913842 0.0037414 0.8862946 0.594
Residual 472 50.9612496 0.9962586 NA NA
Total 474 51.1526339 1.0000000 NA NA
# adjusted for Race
adonis2(dist_wu_ps~ Race, data = meta_ps)
Df SumOfSqs R2 F Pr(>F)
Model 1 0.097022 0.0018967 0.8988512 0.48
Residual 473 51.055612 0.9981033 NA NA
Total 474 51.152634 1.0000000 NA NA
# adjusted for Cancer
adonis2(dist_wu_ps~ Cancer, data = meta_ps)
Df SumOfSqs R2 F Pr(>F)
Model 1 0.0827249 0.0016172 0.7661825 0.659
Residual 473 51.0699090 0.9983828 NA NA
Total 474 51.1526339 1.0000000 NA NA
# adjusted for Cancer Type
adonis2(dist_wu_ps~ Cancer_Type, data = meta_ps)
Df SumOfSqs R2 F Pr(>F)
Model 2 0.2277782 0.0044529 1.055588 0.361
Residual 472 50.9248557 0.9955471 NA NA
Total 474 51.1526339 1.0000000 NA NA
# adjusted for LUTS
adonis2(dist_wu_ps~ LUTS_cat, data = meta_ps)
Df SumOfSqs R2 F Pr(>F)
Model 1 0.1304846 0.0025509 1.209656 0.254
Residual 473 51.0221492 0.9974491 NA NA
Total 474 51.1526339 1.0000000 NA NA

covariate order test

This is just a small test to see if order of covariates matter (for future use). I have found out that order of covariates DOES matter. Even when setting seeds it matters.

Age and BMI are significantly associated with microbiome composition as measured weighted UNIFRAC distance, but Prostatitis, Diabetes, and BPH are not.

UNweighted unifrac

# Age-adjusted
adonis2(dist_u_ps~ age, data = meta_ps)
Df SumOfSqs R2 F Pr(>F)
Model 1 0.3796739 0.0040043 1.901665 0.012
Residual 473 94.4360790 0.9959957 NA NA
Total 474 94.8157528 1.0000000 NA NA
# BMI-adjusted (although BMI was not sig. different from table1)
adonis2(dist_u_ps~ HWBMI.x, data = meta_ps, permutations = 10000)
Df SumOfSqs R2 F Pr(>F)
Model 1 0.5267817 0.0055558 2.642597 0.0010999
Residual 473 94.2889711 0.9944442 NA NA
Total 474 94.8157528 1.0000000 NA NA
# Adjusted for prostatitis
adonis2(dist_u_ps~ Prostatitis, data = meta_ps)
Df SumOfSqs R2 F Pr(>F)
Model 1 0.1042536 0.0010995 0.5206545 0.984
Residual 473 94.7114992 0.9989005 NA NA
Total 474 94.8157528 1.0000000 NA NA
# adjusted for BPH
adonis2(dist_u_ps~ BPH, data = meta_ps)
Df SumOfSqs R2 F Pr(>F)
Model 1 0.2209254 0.00233 1.104687 0.31
Residual 473 94.5948275 0.99767 NA NA
Total 474 94.8157528 1.00000 NA NA
# adjusted for Diabetes
adonis2(dist_u_ps~ Diabetes, data = meta_ps)
Df SumOfSqs R2 F Pr(>F)
Model 2 0.2961699 0.0031236 0.7394879 0.904
Residual 472 94.5195830 0.9968764 NA NA
Total 474 94.8157528 1.0000000 NA NA
# adjusted for Race
adonis2(dist_u_ps~ Race, data = meta_ps)
Df SumOfSqs R2 F Pr(>F)
Model 1 0.2233928 0.0023561 1.117054 0.295
Residual 473 94.5923600 0.9976439 NA NA
Total 474 94.8157528 1.0000000 NA NA
# adjusted for Cancer
adonis2(dist_u_ps~ Cancer, data = meta_ps)
Df SumOfSqs R2 F Pr(>F)
Model 1 0.1706101 0.0017994 0.8526437 0.655
Residual 473 94.6451427 0.9982006 NA NA
Total 474 94.8157528 1.0000000 NA NA
# adjusted for Cancer Type
adonis2(dist_u_ps~ Cancer_Type, data = meta_ps)
Df SumOfSqs R2 F Pr(>F)
Model 2 0.3413503 0.0036001 0.8527036 0.702
Residual 472 94.4744026 0.9963999 NA NA
Total 474 94.8157528 1.0000000 NA NA
# adjusted for LUTS
adonis2(dist_u_ps~ LUTS_cat, data = meta_ps)
Df SumOfSqs R2 F Pr(>F)
Model 1 0.1907699 0.002012 0.9535977 0.481
Residual 473 94.6249829 0.997988 NA NA
Total 474 94.8157528 1.000000 NA NA

Age and BMI are significantly associated with microbiome composition as measured unweighted UNIFRAC distance, but Prostatitis, Diabetes, and BPH are not.

Bray-Curtis

# Age-adjusted
adonis2(dist_b_ps~ age, data = meta_ps, permutations = 10000)
Df SumOfSqs R2 F Pr(>F)
Model 1 0.9272392 0.0054545 2.594117 0.0025997
Residual 473 169.0687253 0.9945455 NA NA
Total 474 169.9959645 1.0000000 NA NA
# BMI-adjusted (although BMI was not sig. different from table1)
adonis2(dist_b_ps~ HWBMI.x, data = meta_ps, permutations = 10000)
Df SumOfSqs R2 F Pr(>F)
Model 1 1.493583 0.008786 4.192611 1e-04
Residual 473 168.502381 0.991214 NA NA
Total 474 169.995965 1.000000 NA NA
# Adjusted for prostatitis
adonis2(dist_b_ps~ Prostatitis, data = meta_ps)
Df SumOfSqs R2 F Pr(>F)
Model 1 0.4419633 0.0025998 1.232932 0.213
Residual 473 169.5540013 0.9974002 NA NA
Total 474 169.9959645 1.0000000 NA NA
# adjusted for BPH
adonis2(dist_b_ps~ BPH, data = meta_ps)
Df SumOfSqs R2 F Pr(>F)
Model 1 0.3102106 0.0018248 0.8647138 0.586
Residual 473 169.6857539 0.9981752 NA NA
Total 474 169.9959645 1.0000000 NA NA
# adjusted for Diabetes
adonis2(dist_b_ps~ Diabetes, data = meta_ps)
Df SumOfSqs R2 F Pr(>F)
Model 2 0.5068785 0.0029817 0.7057877 0.883
Residual 472 169.4890861 0.9970183 NA NA
Total 474 169.9959645 1.0000000 NA NA
# adjusted for Race
adonis2(dist_b_ps~ Race, data = meta_ps)
Df SumOfSqs R2 F Pr(>F)
Model 1 0.3064525 0.0018027 0.8542192 0.623
Residual 473 169.6895120 0.9981973 NA NA
Total 474 169.9959645 1.0000000 NA NA
# adjusted for Cancer
adonis2(dist_b_ps~ Cancer, data = meta_ps)
Df SumOfSqs R2 F Pr(>F)
Model 1 0.5732719 0.0033723 1.60048 0.066
Residual 473 169.4226926 0.9966277 NA NA
Total 474 169.9959645 1.0000000 NA NA
# adjusted for Cancer Type
adonis2(dist_b_ps~ Cancer_Type, data = meta_ps)
Df SumOfSqs R2 F Pr(>F)
Model 2 1.106231 0.0065074 1.545804 0.03
Residual 472 168.889734 0.9934926 NA NA
Total 474 169.995965 1.0000000 NA NA
#adjusted for LUTS
adonis2(dist_b_ps~ LUTS_cat, data = meta_ps)
Df SumOfSqs R2 F Pr(>F)
Model 1 0.4919595 0.0028939 1.37281 0.136
Residual 473 169.5040050 0.9971061 NA NA
Total 474 169.9959645 1.0000000 NA NA
beta div plots
# adding ordination plot
ps_ord <- ordinate(ps_g, method = "NMDS", distance = "bray")
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.2487638 
## Run 1 stress 0.2504327 
## Run 2 stress 0.2519704 
## Run 3 stress 0.2497209 
## Run 4 stress 0.2515976 
## Run 5 stress 0.2489159 
## ... Procrustes: rmse 0.01934007  max resid 0.261702 
## Run 6 stress 0.2506757 
## Run 7 stress 0.2520593 
## Run 8 stress 0.2538555 
## Run 9 stress 0.2505582 
## Run 10 stress 0.250031 
## Run 11 stress 0.25077 
## Run 12 stress 0.2524964 
## Run 13 stress 0.2720698 
## Run 14 stress 0.2510881 
## Run 15 stress 0.2505918 
## Run 16 stress 0.2506133 
## Run 17 stress 0.2510039 
## Run 18 stress 0.2506378 
## Run 19 stress 0.2758826 
## Run 20 stress 0.2522014 
## *** Best solution was not repeated -- monoMDS stopping criteria:
##     18: no. of iterations >= maxit
##      2: stress ratio > sratmax
ord_plot <- plot_ordination(ps_g, ps_ord,  color = "Cancer_Type") +
    geom_point(size = 2) +
  #coord_fixed(xlim = c(-1.8, 1.8), ylim = c(-1.8,1.8)) +
  ggtitle(paste("NMDS, bray-curtis"))
  
plot(ord_plot)

adonis2(dist_b_ps~ age + HWBMI.x + Race + Prostatitis + BPH + Diabetes , data = meta_ps)
Df SumOfSqs R2 F Pr(>F)
Model 7 4.101679 0.0241281 1.649487 0.001
Residual 467 165.894286 0.9758719 NA NA
Total 474 169.995965 1.0000000 NA NA

Top bacteria

Phyla

options(scipen=999)

# transform to relative abundance
ps_n <- transform_sample_counts(ps,function(x) 100* x/sum(x))

ps_n_phy <- tax_glom(ps_n, "Phylum")

ps_melt <- psmelt(ps_n_phy)
## Warning in psmelt(ps_n_phy): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
summary_phylum <- ps_melt %>%
        group_by(Phylum) %>%
        summarise(rank_abundance = sum(Abundance), mean_abundance = mean(Abundance), median_abundance = median(Abundance), min_abundance = min(Abundance), max_abundance = max(Abundance)) %>%
        arrange(desc(rank_abundance)) %>%
        mutate(order = row_number()) %>%
        ungroup()

#mutate(mean_abundance = round(mean_abundance, digits = 1), min_abundance = round(min_abundance, digits = 1), max_abundance = round(max_abundance, digits = 1))

plotting

summary_phylum[1:5,] %>% 
  ggplot(., aes(x = Phylum, y = median_abundance)) + geom_boxplot()

Genera

ps_n_genus <- tax_glom(ps_n, "Genus")

ps_melt <- psmelt(ps_n_genus)
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
summary_genus <- ps_melt %>%
        group_by(Genus) %>%
        summarise(rank_abundance = sum(Abundance), mean_abundance = mean(Abundance), median_abundance = median(Abundance), min_abundance = min(Abundance), max_abundance = max(Abundance)) %>%
        arrange(desc(rank_abundance)) %>%
        mutate(order = row_number()) %>%
        ungroup()

#ps_10 <- summary_genus[1:10,]
#ps_10$LUTS <- "all"

Microbiome relationships to cohort characteristics

test for normality

shapiro.test(meta_ps$Observed)
## 
##  Shapiro-Wilk normality test
## 
## data:  meta_ps$Observed
## W = 0.97201, p-value = 0.00000006935
shapiro.test(meta_ps$Shannon)
## 
##  Shapiro-Wilk normality test
## 
## data:  meta_ps$Shannon
## W = 0.94732, p-value = 0.000000000005859
shapiro.test(meta_ps$InvSimpson)
## 
##  Shapiro-Wilk normality test
## 
## data:  meta_ps$InvSimpson
## W = 0.90914, p-value = 0.000000000000000296
shapiro.test(meta_ps$pielou)
## 
##  Shapiro-Wilk normality test
## 
## data:  meta_ps$pielou
## W = 0.91781, p-value = 0.000000000000002071
shapiro.test(meta_ps$age)
## 
##  Shapiro-Wilk normality test
## 
## data:  meta_ps$age
## W = 0.95133, p-value = 0.00000000002141
shapiro.test(meta_ps$HWBMI.x)
## 
##  Shapiro-Wilk normality test
## 
## data:  meta_ps$HWBMI.x
## W = 0.97598, p-value = 0.0000004793

Turns out all these variables are non-normal.

Age (65-74, 75-79, >=80); check numbers

ggplot(meta_ps, aes(x = age)) + geom_histogram(stat = "count")
## Warning in geom_histogram(stat = "count"): Ignoring unknown parameters:
## `binwidth`, `bins`, and `pad`

Splitting age into tertiles

tertiles <- quantile(meta_ps$age, c(0:3/3))# classify values
meta_ps$age_group = with(meta_ps, 
               cut(age, 
                   tertiles, 
                   include.lowest = T, 
                   labels = c("age1", "age2", "age3")))

table(meta_ps$age_group)
## 
## age1 age2 age3 
##  193  134  148
ggplot(meta_ps, aes(x = age_group, y = age)) + geom_boxplot()

#sample_data(ps) <- meta_ps

Age - groups alpha diversity

kruskal.test(meta_ps$Observed ~ meta_ps$age_group)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  meta_ps$Observed by meta_ps$age_group
## Kruskal-Wallis chi-squared = 3.229, df = 2, p-value = 0.199
kruskal.test(meta_ps$Shannon ~ meta_ps$age_group)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  meta_ps$Shannon by meta_ps$age_group
## Kruskal-Wallis chi-squared = 2.6293, df = 2, p-value = 0.2686
kruskal.test(meta_ps$InvSimpson ~ meta_ps$age_group)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  meta_ps$InvSimpson by meta_ps$age_group
## Kruskal-Wallis chi-squared = 2.1821, df = 2, p-value = 0.3359
kruskal.test(meta_ps$pielou ~ meta_ps$age_group)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  meta_ps$pielou by meta_ps$age_group
## Kruskal-Wallis chi-squared = 2.1665, df = 2, p-value = 0.3385

Age - continuous with alpha diversity

# testing Observed
ggplot(meta_ps, aes(x = as.numeric(age), y = Observed)) + geom_point() + stat_smooth(method = "lm", col = "red")
## `geom_smooth()` using formula = 'y ~ x'

res <- cor.test(as.numeric(meta_ps$age), meta_ps$Observed, 
                    method = "spearman") #pearson is for normal, spearman is for non-normal
## Warning in cor.test.default(as.numeric(meta_ps$age), meta_ps$Observed, method =
## "spearman"): Cannot compute exact p-value with ties
res$p.value
## [1] 0.1718558
# testing Shannon
ggplot(meta_ps, aes(x = as.numeric(age), y = Shannon)) + geom_point() + stat_smooth(method = "lm", col = "red")
## `geom_smooth()` using formula = 'y ~ x'

res <- cor.test(as.numeric(meta_ps$age), meta_ps$Shannon, 
                    method = "spearman") #pearson is for normal, spearman is for non-normal
## Warning in cor.test.default(as.numeric(meta_ps$age), meta_ps$Shannon, method =
## "spearman"): Cannot compute exact p-value with ties
res$p.value
## [1] 0.2189915
# testing InvSimpson
ggplot(meta_ps, aes(x = as.numeric(age), y = InvSimpson)) + geom_point() +
  stat_smooth(method = "lm", col = "red")
## `geom_smooth()` using formula = 'y ~ x'

res <- cor.test(as.numeric(meta_ps$age), meta_ps$InvSimpson, 
                    method = "spearman") #pearson is for normal, spearman is for non-normal
## Warning in cor.test.default(as.numeric(meta_ps$age), meta_ps$InvSimpson, :
## Cannot compute exact p-value with ties
res$p.value
## [1] 0.2377173
# Testing pielou
ggplot(meta_ps, aes(x = as.numeric(age), y = pielou)) + geom_point() + stat_smooth(method = "lm", col = "red")
## `geom_smooth()` using formula = 'y ~ x'

res <- cor.test(as.numeric(meta_ps$age), meta_ps$pielou, 
                    method = "spearman") #pearson is for normal, spearman is for non-normal
## Warning in cor.test.default(as.numeric(meta_ps$age), meta_ps$pielou, method =
## "spearman"): Cannot compute exact p-value with ties
res$p.value
## [1] 0.1882539
# test age, all alpha div measures for normality firsts

Age groups - beta diversity

adonis2(dist_wu_ps~ age_group, data = meta_ps)
Df SumOfSqs R2 F Pr(>F)
Model 2 0.3756496 0.0073437 1.745935 0.039
Residual 472 50.7769842 0.9926563 NA NA
Total 474 51.1526339 1.0000000 NA NA
adonis2(dist_u_ps~ age_group, data = meta_ps)
Df SumOfSqs R2 F Pr(>F)
Model 2 0.6016895 0.0063459 1.507192 0.029
Residual 472 94.2140633 0.9936541 NA NA
Total 474 94.8157528 1.0000000 NA NA
adonis2(dist_b_ps~ age_group, data = meta_ps)
Df SumOfSqs R2 F Pr(>F)
Model 2 1.163632 0.0068451 1.626567 0.022
Residual 472 168.832333 0.9931549 NA NA
Total 474 169.995965 1.0000000 NA NA

Age - with top Phyla

# Firmicutes
ps_melt <- psmelt(ps_n_phy) %>% filter(Phylum == "Firmicutes")
## Warning in psmelt(ps_n_phy): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
shapiro.test(ps_melt$Abundance)
## 
##  Shapiro-Wilk normality test
## 
## data:  ps_melt$Abundance
## W = 0.96116, p-value = 0.0000000007126
ggplot(ps_melt, aes(x = age, y = Abundance)) + geom_point() + stat_smooth(method = "lm", col = "red")
## `geom_smooth()` using formula = 'y ~ x'

res <- cor.test(ps_melt$age, ps_melt$Abundance, 
                    method = "spearman") #pearson is for normal, spearman is for non-normal
## Warning in cor.test.default(ps_melt$age, ps_melt$Abundance, method =
## "spearman"): Cannot compute exact p-value with ties
res$p.value
## [1] 0.7419283
# Proteobacteria
ps_melt <- psmelt(ps_n_phy) %>% filter(Phylum == "Proteobacteria")
## Warning in psmelt(ps_n_phy): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
shapiro.test(ps_melt$Abundance)
## 
##  Shapiro-Wilk normality test
## 
## data:  ps_melt$Abundance
## W = 0.84118, p-value < 0.00000000000000022
ggplot(ps_melt, aes(x = age, y = Abundance)) + geom_point() + stat_smooth(method = "lm", col = "red")
## `geom_smooth()` using formula = 'y ~ x'

res <- cor.test(ps_melt$age, ps_melt$Abundance, 
                    method = "spearman") #pearson is for normal, spearman is for non-normal
## Warning in cor.test.default(ps_melt$age, ps_melt$Abundance, method =
## "spearman"): Cannot compute exact p-value with ties
res$p.value
## [1] 0.3939888
# Actinobacteria
ps_melt <- psmelt(ps_n_phy) %>% filter(Phylum == "Actinobacteria")
## Warning in psmelt(ps_n_phy): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
shapiro.test(ps_melt$Abundance)
## 
##  Shapiro-Wilk normality test
## 
## data:  ps_melt$Abundance
## W = 0.80902, p-value < 0.00000000000000022
ggplot(ps_melt, aes(x = age, y = Abundance)) + geom_point() + stat_smooth(method = "lm", col = "red")
## `geom_smooth()` using formula = 'y ~ x'

res <- cor.test(ps_melt$age, ps_melt$Abundance, 
                    method = "spearman") #pearson is for normal, spearman is for non-normal
## Warning in cor.test.default(ps_melt$age, ps_melt$Abundance, method =
## "spearman"): Cannot compute exact p-value with ties
res$p.value
## [1] 0.07209791
# Bacteroidetes
ps_melt <- psmelt(ps_n_phy) %>% filter(Phylum == "Bacteroidetes")
## Warning in psmelt(ps_n_phy): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
shapiro.test(ps_melt$Abundance)
## 
##  Shapiro-Wilk normality test
## 
## data:  ps_melt$Abundance
## W = 0.77261, p-value < 0.00000000000000022
ggplot(ps_melt, aes(x = age, y = Abundance)) + geom_point() + stat_smooth(method = "lm", col = "red")
## `geom_smooth()` using formula = 'y ~ x'

res <- cor.test(ps_melt$age, ps_melt$Abundance, 
                    method = "spearman") #pearson is for normal, spearman is for non-normal
## Warning in cor.test.default(ps_melt$age, ps_melt$Abundance, method =
## "spearman"): Cannot compute exact p-value with ties
res$p.value
## [1] 0.1918584

Enrollment site

alpha diversity

plot_richness(ps, x  = "SITE", color = "SITE", scales = "free_y", measures=c("Observed", "Shannon", "InvSimpson")) +  geom_boxplot()

BMI (<=25 kg/m2, 25-29.9 kg/m2, >30 kg/m2); check numbers

# creating bmi_groups
meta_ps$BMI_group <- 'healthy_weight'
meta_ps$BMI_group[meta_ps$HWBMI.x >= 25] <- 'overweight'
meta_ps$BMI_group[meta_ps$HWBMI.x >= 30] <- 'obesity'

table(meta_ps$BMI_group)
## 
## healthy_weight        obesity     overweight 
##            104            118            253
ggplot(meta_ps, aes(x = HWBMI.x)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

sample_data(ps_g) <- meta_ps
sample_data(ps_n) <- meta_ps
sample_data(ps) <- meta_ps

BMI Adonis

adonis2(dist_wu_ps~ BMI_group, data = meta_ps, permutations = 10000)
Df SumOfSqs R2 F Pr(>F)
Model 2 0.4641226 0.0090733 2.160902 0.0083992
Residual 472 50.6885113 0.9909267 NA NA
Total 474 51.1526339 1.0000000 NA NA
adonis2(dist_u_ps~ BMI_group, data = meta_ps)
Df SumOfSqs R2 F Pr(>F)
Model 2 0.5775524 0.0060913 1.44636 0.032
Residual 472 94.2382005 0.9939087 NA NA
Total 474 94.8157528 1.0000000 NA NA
adonis2(dist_b_ps~ BMI_group, data = meta_ps, permutations = 10000)
Df SumOfSqs R2 F Pr(>F)
Model 2 1.450256 0.0085311 2.030668 0.0016998
Residual 472 168.545708 0.9914689 NA NA
Total 474 169.995965 1.0000000 NA NA

pairwise

healthy weight v overweight

What I’ve found is obesity is significantly different from both healthy and overweight. Overweight and healthy weight are not significantly different.

BMI with beta div

# adding ordination plot
ps_ord <- ordinate(ps_g, method = "NMDS", distance = "bray")
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.2487638 
## Run 1 stress 0.2500495 
## Run 2 stress 0.2506223 
## Run 3 stress 0.2544275 
## Run 4 stress 0.2936727 
## Run 5 stress 0.2490637 
## ... Procrustes: rmse 0.02405374  max resid 0.3977212 
## Run 6 stress 0.2497193 
## Run 7 stress 0.249183 
## ... Procrustes: rmse 0.02611945  max resid 0.4501689 
## Run 8 stress 0.256932 
## Run 9 stress 0.2511705 
## Run 10 stress 0.2488965 
## ... Procrustes: rmse 0.01979155  max resid 0.3074272 
## Run 11 stress 0.2500646 
## Run 12 stress 0.25015 
## Run 13 stress 0.249494 
## Run 14 stress 0.2485425 
## ... New best solution
## ... Procrustes: rmse 0.01830247  max resid 0.2877571 
## Run 15 stress 0.2495614 
## Run 16 stress 0.2492315 
## Run 17 stress 0.250205 
## Run 18 stress 0.269858 
## Run 19 stress 0.2498718 
## Run 20 stress 0.2496044 
## *** Best solution was not repeated -- monoMDS stopping criteria:
##     15: no. of iterations >= maxit
##      5: stress ratio > sratmax
ord_plot <- plot_ordination(ps_g, ps_ord,  color = "age") +
    geom_point(size = 2) +
  #coord_fixed(xlim = c(-1.8, 1.8), ylim = c(-1.8,1.8)) +
  ggtitle(paste("PCoA, weighted UNIFRAC")) + facet_wrap(~BMI_group)
  
plot(ord_plot)

BMI continuous variable with alpha div

# testing Observed
ggplot(meta_ps, aes(x = HWBMI.x, y = Observed)) + geom_point() + stat_smooth(method = "lm", col = "red")
## `geom_smooth()` using formula = 'y ~ x'

res <- cor.test(meta_ps$HWBMI.x, meta_ps$Observed, 
                    method = "spearman") #pearson is for normal, spearman is for non-normal
## Warning in cor.test.default(meta_ps$HWBMI.x, meta_ps$Observed, method =
## "spearman"): Cannot compute exact p-value with ties
res$p.value
## [1] 0.242052
# testing Shannon
ggplot(meta_ps, aes(x = HWBMI.x, y = Shannon)) + geom_point() + stat_smooth(method = "lm", col = "red")
## `geom_smooth()` using formula = 'y ~ x'

res <- cor.test(meta_ps$HWBMI.x, meta_ps$Shannon, 
                    method = "spearman") #pearson is for normal, spearman is for non-normal
## Warning in cor.test.default(meta_ps$HWBMI.x, meta_ps$Shannon, method =
## "spearman"): Cannot compute exact p-value with ties
res$p.value
## [1] 0.8992064
# testing InvSimpson
ggplot(meta_ps, aes(x = HWBMI.x, y = InvSimpson)) + geom_point() + stat_smooth(method = "lm", col = "red")
## `geom_smooth()` using formula = 'y ~ x'

res <- cor.test(meta_ps$HWBMI.x, meta_ps$InvSimpson, 
                    method = "spearman") #pearson is for normal, spearman is for non-normal
## Warning in cor.test.default(meta_ps$HWBMI.x, meta_ps$InvSimpson, method =
## "spearman"): Cannot compute exact p-value with ties
res$p.value
## [1] 0.894394
# Testing pielou
ggplot(meta_ps, aes(x = HWBMI.x, y = pielou)) + geom_point() + stat_smooth(method = "lm", col = "red")
## `geom_smooth()` using formula = 'y ~ x'

res <- cor.test(meta_ps$HWBMI.x, meta_ps$pielou, 
                    method = "spearman") #pearson is for normal, spearman is for non-normal
## Warning in cor.test.default(meta_ps$HWBMI.x, meta_ps$pielou, method =
## "spearman"): Cannot compute exact p-value with ties
res$p.value
## [1] 0.6109389

BMI groups variable with alpha div

kruskal.test(meta_ps$Observed ~ meta_ps$BMI_group)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  meta_ps$Observed by meta_ps$BMI_group
## Kruskal-Wallis chi-squared = 0.91607, df = 2, p-value = 0.6325
kruskal.test(meta_ps$Shannon ~ meta_ps$BMI_group)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  meta_ps$Shannon by meta_ps$BMI_group
## Kruskal-Wallis chi-squared = 1.4866, df = 2, p-value = 0.4755
kruskal.test(meta_ps$InvSimpson ~ meta_ps$BMI_group)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  meta_ps$InvSimpson by meta_ps$BMI_group
## Kruskal-Wallis chi-squared = 2.3505, df = 2, p-value = 0.3087
kruskal.test(meta_ps$pielou ~ meta_ps$BMI_group)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  meta_ps$pielou by meta_ps$BMI_group
## Kruskal-Wallis chi-squared = 4.2825, df = 2, p-value = 0.1175

BMI alpha div values

meta_ps %>% 
  group_by(BMI_group) %>% 
  summarise(med_obs = quantile(Observed, probs = c(0.25,0.5,0.75)), med_shan = quantile(Shannon, probs = c(0.25,0.5,0.75)), med_inv = quantile(InvSimpson, probs = c(0.25,0.5,0.75)), med_pie = quantile(pielou, probs = c(0.25,0.5,0.75)))
## Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
## dplyr 1.1.0.
## ℹ Please use `reframe()` instead.
## ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
##   always returns an ungrouped data frame and adjust accordingly.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## `summarise()` has grouped output by 'BMI_group'. You can override using the
## `.groups` argument.
BMI_group med_obs med_shan med_inv med_pie
healthy_weight 16.0 1.685743 2.979081 0.5540822
healthy_weight 22.0 2.087516 5.017265 0.6613625
healthy_weight 30.0 2.395938 7.104957 0.7437547
obesity 18.0 1.612226 3.245733 0.5621228
obesity 24.5 2.060327 4.916635 0.6391371
obesity 30.0 2.361623 7.028608 0.7407123
overweight 16.0 1.702340 3.472576 0.5767366
overweight 22.0 2.136180 5.453378 0.6860155
overweight 29.0 2.409417 7.367755 0.7596021
meta_ps %>% 
  group_by(BMI_group) %>% 
  summarise(mean_obs = mean(Observed), mean_shan = mean(Shannon), mean_inv = mean(InvSimpson), mean_pie = mean(pielou))
BMI_group mean_obs mean_shan mean_inv mean_pie
healthy_weight 24.13462 1.972215 5.457860 0.6315111
obesity 23.98305 1.952268 5.438977 0.6213352
overweight 23.32016 2.022743 5.878325 0.6548878

Top genera and BMI

ps_n_genus <- tax_glom(ps_n, "Genus")
ps_melt <- psmelt(ps_n_genus)
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
summary_genus <- ps_melt %>%
        group_by(Genus, BMI_group) %>%
        summarise(Abundance = Abundance, rank_abundance = sum(Abundance), mean_abundance = mean(Abundance), median_abundance = median(Abundance), min_abundance = min(Abundance), max_abundance = max(Abundance)) %>%
        arrange(desc(rank_abundance)) %>%
        mutate(order = row_number())
## Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
## dplyr 1.1.0.
## ℹ Please use `reframe()` instead.
## ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
##   always returns an ungrouped data frame and adjust accordingly.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## `summarise()` has grouped output by 'Genus', 'BMI_group'. You can override
## using the `.groups` argument.
# Staphylococcus
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Staphylococcus")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$BMI_group)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$BMI_group
## Kruskal-Wallis chi-squared = 10.294, df = 2, p-value = 0.005816
pairwise.wilcox.test(ps_melt$Abundance, ps_melt$BMI_group, p.adjust.method = "fdr")
## 
##  Pairwise comparisons using Wilcoxon rank sum test with continuity correction 
## 
## data:  ps_melt$Abundance and ps_melt$BMI_group 
## 
##            healthy_weight obesity
## obesity    0.0065         -      
## overweight 0.1901         0.0226 
## 
## P value adjustment method: fdr
# Neisseria
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Neisseria")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$BMI_group)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$BMI_group
## Kruskal-Wallis chi-squared = 2.1258, df = 2, p-value = 0.3454
# Corynebacterium
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Corynebacterium")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$BMI_group)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$BMI_group
## Kruskal-Wallis chi-squared = 9.5717, df = 2, p-value = 0.008347
pairwise.wilcox.test(ps_melt$Abundance, ps_melt$BMI_group, p.adjust.method = "fdr")
## 
##  Pairwise comparisons using Wilcoxon rank sum test with continuity correction 
## 
## data:  ps_melt$Abundance and ps_melt$BMI_group 
## 
##            healthy_weight obesity
## obesity    0.0057         -      
## overweight 0.1469         0.0526 
## 
## P value adjustment method: fdr
# Prevotella
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Prevotella")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$BMI_group)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$BMI_group
## Kruskal-Wallis chi-squared = 4.0539, df = 2, p-value = 0.1317
# Streptococcus
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Streptococcus")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$BMI_group)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$BMI_group
## Kruskal-Wallis chi-squared = 19.441, df = 2, p-value = 0.00006003
pairwise.wilcox.test(ps_melt$Abundance, ps_melt$BMI_group, p.adjust.method = "fdr")
## 
##  Pairwise comparisons using Wilcoxon rank sum test with continuity correction 
## 
## data:  ps_melt$Abundance and ps_melt$BMI_group 
## 
##            healthy_weight obesity
## obesity    0.00040        -      
## overweight 0.67509        0.00012
## 
## P value adjustment method: fdr
# Bacteroides
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Bacteroides")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$BMI_group)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$BMI_group
## Kruskal-Wallis chi-squared = 2.5372, df = 2, p-value = 0.2812
# Anaerococcus
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Anaerococcus")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$BMI_group)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$BMI_group
## Kruskal-Wallis chi-squared = 0.20641, df = 2, p-value = 0.9019
# Haemophilus
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Haemophilus")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$BMI_group)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$BMI_group
## Kruskal-Wallis chi-squared = 1.6184, df = 2, p-value = 0.4452
# Escherichia/Shigella
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Escherichia/Shigella")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$BMI_group)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$BMI_group
## Kruskal-Wallis chi-squared = 0.93119, df = 2, p-value = 0.6278
# Finegoldia
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Finegoldia")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$BMI_group)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$BMI_group
## Kruskal-Wallis chi-squared = 1.2924, df = 2, p-value = 0.524

plot

BMI_genera is saved to be put together with another plot for Figure 1 during HAllA analysis.

summary_genus$BMI_title <- "overweight"
summary_genus$BMI_title[summary_genus$BMI_group == "healthy_weight"] <- "healthy \nweight"
summary_genus$BMI_title[summary_genus$BMI_group == "obesity"] <- "obese"
summary_genus$BMI_title <- factor(summary_genus$BMI_title, levels = c("healthy \nweight", "overweight", "obese"))


summary_genus$BMI_group <- factor(summary_genus$BMI_group, levels = c("healthy_weight", "overweight", "obesity"))

plot <- summary_genus %>%
  filter(Genus %in% c("Staphylococcus", "Corynebacterium", "Streptococcus")) %>% 
  ggplot(., aes(x = BMI_title, y = Abundance, fill = BMI_title)) + 
  geom_beeswarm() +
  geom_boxplot(alpha = 0.7) +
  xlab("") + 
  facet_wrap(~Genus) + 
  theme_bw() +
  theme(strip.background = element_rect(colour="black", fill="white")) +
  theme(legend.position = "none")

my_comparisons <- list( c("healthy \nweight", "overweight"), c("healthy \nweight", "obese"), c("overweight", "obese"))

BMI_genera <- plot + stat_compare_means(comparisons = my_comparisons) +
  scale_y_continuous(expand = expansion(mult = c(0.05, 0.15)))
BMI_genera

summary_genus %>%
  filter(Genus %in% c("Staphylococcus", "Corynebacterium", "Streptococcus", "Neisseria", "Prevotella", "Anaerococcus", "Bacteroides", "Finegoldia", "Escherichia/Shigella", "Haemophilus")) %>% 
  ggplot(., aes(x = Genus, y = Abundance, fill = BMI_title)) + geom_boxplot() + facet_wrap(~Genus, scales = "free") + xlab("") + guides(fill=guide_legend(title="BMI"))

Top genera and BPH

ps_melt <- psmelt(ps_n_genus)
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
summary_genus <- ps_melt %>%
        group_by(Genus, BPH) %>%
        summarise(Abundance = Abundance, rank_abundance = sum(Abundance), mean_abundance = mean(Abundance), median_abundance = median(Abundance), min_abundance = min(Abundance), max_abundance = max(Abundance)) %>%
        arrange(desc(rank_abundance)) %>%
        mutate(order = row_number())
## Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
## dplyr 1.1.0.
## ℹ Please use `reframe()` instead.
## ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
##   always returns an ungrouped data frame and adjust accordingly.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## `summarise()` has grouped output by 'Genus', 'BPH'. You can override using the
## `.groups` argument.
# Staphylococcus
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Staphylococcus")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$BPH)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$BPH
## Kruskal-Wallis chi-squared = 0.011221, df = 1, p-value = 0.9156
# Neisseria
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Neisseria")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$BPH)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$BPH
## Kruskal-Wallis chi-squared = 0.3368, df = 1, p-value = 0.5617
# Corynebacterium
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Corynebacterium")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$BPH)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$BPH
## Kruskal-Wallis chi-squared = 0.45521, df = 1, p-value = 0.4999
# Prevotella
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Prevotella")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$BPH)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$BPH
## Kruskal-Wallis chi-squared = 0.076543, df = 1, p-value = 0.782
# Streptococcus
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Streptococcus")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$BPH)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$BPH
## Kruskal-Wallis chi-squared = 1.1512, df = 1, p-value = 0.2833
# Bacteroides
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Bacteroides")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$BPH)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$BPH
## Kruskal-Wallis chi-squared = 0.59891, df = 1, p-value = 0.439
# Anaerococcus
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Anaerococcus")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$BPH)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$BPH
## Kruskal-Wallis chi-squared = 0.44092, df = 1, p-value = 0.5067
# Haemophilus
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Haemophilus")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$BPH)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$BPH
## Kruskal-Wallis chi-squared = 0.35104, df = 1, p-value = 0.5535
# Escherichia/Shigella
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Escherichia/Shigella")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$BPH)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$BPH
## Kruskal-Wallis chi-squared = 0.21031, df = 1, p-value = 0.6465
# Finegoldia
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Finegoldia")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$BPH)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$BPH
## Kruskal-Wallis chi-squared = 0.017165, df = 1, p-value = 0.8958

plot

summary_genus %>%
  filter(Genus %in% c("Staphylococcus", "Corynebacterium", "Streptococcus")) %>% 
  ggplot(., aes(x = Genus, y = Abundance, fill = BPH)) + geom_boxplot() + facet_wrap(~Genus, scales = "free")

summary_genus %>%
  filter(Genus %in% c("Staphylococcus", "Corynebacterium", "Streptococcus", "Neisseria", "Prevotella", "Anaerococcus", "Bacteroides", "Finegoldia", "Escherichia/Shigella", "Haemophilus")) %>% 
  ggplot(., aes(x = Genus, y = Abundance, fill = BPH)) + geom_boxplot() + facet_wrap(~Genus, scales = "free")

Top genera and age

ps_melt <- psmelt(ps_n_genus)
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
summary_genus <- ps_melt %>%
        group_by(Genus, age_group) %>%
        summarise(Abundance = Abundance, rank_abundance = sum(Abundance), mean_abundance = mean(Abundance), median_abundance = median(Abundance), min_abundance = min(Abundance), max_abundance = max(Abundance)) %>%
        arrange(desc(rank_abundance)) %>%
        mutate(order = row_number())
## Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
## dplyr 1.1.0.
## ℹ Please use `reframe()` instead.
## ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
##   always returns an ungrouped data frame and adjust accordingly.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## `summarise()` has grouped output by 'Genus', 'age_group'. You can override
## using the `.groups` argument.
# Staphylococcus
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Staphylococcus")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$age_group)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$age_group
## Kruskal-Wallis chi-squared = 0.60704, df = 2, p-value = 0.7382
# Neisseria
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Neisseria")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$age_group)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$age_group
## Kruskal-Wallis chi-squared = 1.5524, df = 2, p-value = 0.4601
# Corynebacterium
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Corynebacterium")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$age_group)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$age_group
## Kruskal-Wallis chi-squared = 3.9896, df = 2, p-value = 0.136
# Prevotella
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Prevotella")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$age_group)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$age_group
## Kruskal-Wallis chi-squared = 0.052454, df = 2, p-value = 0.9741
# Streptococcus
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Streptococcus")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$age_group)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$age_group
## Kruskal-Wallis chi-squared = 4.4964, df = 2, p-value = 0.1056
# Bacteroides
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Bacteroides")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$age_group)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$age_group
## Kruskal-Wallis chi-squared = 6.5924, df = 2, p-value = 0.03702
pairwise.wilcox.test(ps_melt$Abundance, ps_melt$age_group, p.adjust.method = "fdr")
## 
##  Pairwise comparisons using Wilcoxon rank sum test with continuity correction 
## 
## data:  ps_melt$Abundance and ps_melt$age_group 
## 
##      age1  age2 
## age2 0.059 -    
## age3 0.059 0.903
## 
## P value adjustment method: fdr
# Anaerococcus
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Anaerococcus")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$age_group)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$age_group
## Kruskal-Wallis chi-squared = 4.5813, df = 2, p-value = 0.1012
# Haemophilus
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Haemophilus")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$age_group)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$age_group
## Kruskal-Wallis chi-squared = 4.0821, df = 2, p-value = 0.1299
# Escherichia/Shigella
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Escherichia/Shigella")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$age_group)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$age_group
## Kruskal-Wallis chi-squared = 0.28786, df = 2, p-value = 0.8659
# Finegoldia
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Finegoldia")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$age_group)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$age_group
## Kruskal-Wallis chi-squared = 3.9387, df = 2, p-value = 0.1395

Bacteroides is significant, however once we do multiple testing corrections, it’s no longer significant.

Top phylum and Age

ps_n_phy <- tax_glom(ps_n, "Phylum")
ps_melt <- psmelt(ps_n_phy)
## Warning in psmelt(ps_n_phy): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
summary_phylum <- ps_melt %>%
        group_by(Phylum, age_group) %>%
        summarise(rank_abundance = sum(Abundance), mean_abundance = mean(Abundance), median_abundance = median(Abundance), min_abundance = min(Abundance), max_abundance = max(Abundance), Abundance = Abundance) %>%
        arrange(desc(rank_abundance)) %>%
        mutate(order = row_number()) %>%
        ungroup()
## Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
## dplyr 1.1.0.
## ℹ Please use `reframe()` instead.
## ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
##   always returns an ungrouped data frame and adjust accordingly.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## `summarise()` has grouped output by 'Phylum', 'age_group'. You can override
## using the `.groups` argument.
# Firmicutes
ps_melt <- psmelt(ps_n_phy) %>% filter(Phylum == "Firmicutes")
## Warning in psmelt(ps_n_phy): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$age_group)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$age_group
## Kruskal-Wallis chi-squared = 0.12833, df = 2, p-value = 0.9379
# Proteobacteria
ps_melt <- psmelt(ps_n_phy) %>% filter(Phylum == "Proteobacteria")
## Warning in psmelt(ps_n_phy): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$age_group)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$age_group
## Kruskal-Wallis chi-squared = 0.53725, df = 2, p-value = 0.7644
# Actinobacteria
ps_melt <- psmelt(ps_n_phy) %>% filter(Phylum == "Actinobacteria")
## Warning in psmelt(ps_n_phy): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$age_group)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$age_group
## Kruskal-Wallis chi-squared = 7.0943, df = 2, p-value = 0.02881
pairwise.wilcox.test(ps_melt$Abundance, ps_melt$age_group, p.adjust.method = "fdr")
## 
##  Pairwise comparisons using Wilcoxon rank sum test with continuity correction 
## 
## data:  ps_melt$Abundance and ps_melt$age_group 
## 
##      age1  age2 
## age2 0.346 -    
## age3 0.081 0.034
## 
## P value adjustment method: fdr
# Bacteroidetes
ps_melt <- psmelt(ps_n_phy) %>% filter(Phylum == "Bacteroidetes")
## Warning in psmelt(ps_n_phy): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$age_group)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$age_group
## Kruskal-Wallis chi-squared = 2.4174, df = 2, p-value = 0.2986
# Fusobacteria
ps_melt <- psmelt(ps_n_phy) %>% filter(Phylum == "Cyanobacteria")
## Warning in psmelt(ps_n_phy): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$age_group)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$age_group
## Kruskal-Wallis chi-squared = 1.1149, df = 2, p-value = 0.5727

plot

summary_phylum %>%
  filter(Phylum == "Actinobacteria") %>% 
  ggplot(., aes(x = Phylum, y = Abundance, fill = age_group)) + geom_boxplot()

summary_phylum %>%
  filter(Phylum %in% c("Firmicutes", "Proteobacteria", "Actinobacteria", "Bacteroidetes", "Cyanobacteria")) %>% 
  ggplot(., aes(x = Phylum, y = Abundance, fill = age_group)) + geom_boxplot() + facet_wrap(~Phylum, scales = "free") + scale_fill_discrete(name = "Age range", labels = c("65-70", "71-75", "76-90"))

Cancer status alpha div

kruskal.test(meta_ps$Observed ~ meta_ps$Cancer_Type)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  meta_ps$Observed by meta_ps$Cancer_Type
## Kruskal-Wallis chi-squared = 0.36547, df = 2, p-value = 0.833
kruskal.test(meta_ps$Shannon ~ meta_ps$Cancer_Type)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  meta_ps$Shannon by meta_ps$Cancer_Type
## Kruskal-Wallis chi-squared = 0.48699, df = 2, p-value = 0.7839
kruskal.test(meta_ps$InvSimpson ~ meta_ps$Cancer_Type)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  meta_ps$InvSimpson by meta_ps$Cancer_Type
## Kruskal-Wallis chi-squared = 1.1955, df = 2, p-value = 0.5501
kruskal.test(meta_ps$pielou ~ meta_ps$Cancer_Type)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  meta_ps$pielou by meta_ps$Cancer_Type
## Kruskal-Wallis chi-squared = 2.2425, df = 2, p-value = 0.3259

No cancer associations when it comes to alpha diversity.

Race, Diabetes, Prostatitis, BPH

alpha div for table

# Prostatitis
meta_ps %>% 
  group_by(Prostatitis) %>% 
  summarise(mean_obs = mean(Observed), mean_shan = mean(Shannon), mean_inv =
              mean(InvSimpson), mean_pie = mean(pielou), med_obs = quantile(Observed, probs = c(0.25,0.5,0.75)), med_shan = quantile(Shannon, probs = c(0.25,0.5,0.75)), med_inv = quantile(InvSimpson, probs = c(0.25,0.5,0.75)), med_pie = quantile(pielou, probs = c(0.25,0.5,0.75)))
## Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
## dplyr 1.1.0.
## ℹ Please use `reframe()` instead.
## ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
##   always returns an ungrouped data frame and adjust accordingly.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## `summarise()` has grouped output by 'Prostatitis'. You can override using the
## `.groups` argument.
Prostatitis mean_obs mean_shan mean_inv mean_pie med_obs med_shan med_inv med_pie
No 23.56757 1.980614 5.609384 0.6359285 17 1.700752 3.321660 0.5578825
No 23.56757 1.980614 5.609384 0.6359285 23 2.093647 5.088242 0.6629334
No 23.56757 1.980614 5.609384 0.6359285 30 2.396817 7.204845 0.7531488
Yes 24.00000 2.041951 5.915818 0.6608361 16 1.629969 3.363575 0.5840326
Yes 24.00000 2.041951 5.915818 0.6608361 22 2.136180 5.455359 0.6862615
Yes 24.00000 2.041951 5.915818 0.6608361 31 2.396599 7.475148 0.7480798
# Diabetes
meta_ps %>% 
  group_by(Diabetes) %>% 
  summarise(mean_obs = mean(Observed), mean_shan = mean(Shannon), mean_inv =
              mean(InvSimpson), mean_pie = mean(pielou), med_obs = quantile(Observed, probs = c(0.25,0.5,0.75)), med_shan = quantile(Shannon, probs = c(0.25,0.5,0.75)), med_inv = quantile(InvSimpson, probs = c(0.25,0.5,0.75)), med_pie = quantile(pielou, probs = c(0.25,0.5,0.75)))
## Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
## dplyr 1.1.0.
## ℹ Please use `reframe()` instead.
## ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
##   always returns an ungrouped data frame and adjust accordingly.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## `summarise()` has grouped output by 'Diabetes'. You can override using the
## `.groups` argument.
Diabetes mean_obs mean_shan mean_inv mean_pie med_obs med_shan med_inv med_pie
24.87500 2.178119 6.420298 0.6975589 16 1.877506 3.961239 0.6083842
24.87500 2.178119 6.420298 0.6975589 26 2.285719 6.116462 0.7308268
24.87500 2.178119 6.420298 0.6975589 31 2.559207 8.882585 0.7907660
0. No 23.28962 1.964015 5.541601 0.6335592 16 1.647245 3.245733 0.5617487
0. No 23.28962 1.964015 5.541601 0.6335592 22 2.078796 5.050811 0.6643547
0. No 23.28962 1.964015 5.541601 0.6335592 30 2.377931 7.109470 0.7476157
1. Yes 24.94203 2.047501 5.965145 0.6506712 18 1.768684 3.462120 0.5821625
1. Yes 24.94203 2.047501 5.965145 0.6506712 24 2.176892 5.243518 0.6628698
1. Yes 24.94203 2.047501 5.965145 0.6506712 30 2.391890 7.337760 0.7450489
# BPH
meta_ps %>% 
  group_by(BPH) %>% 
  summarise(mean_obs = mean(Observed), mean_shan = mean(Shannon), mean_inv =
              mean(InvSimpson), mean_pie = mean(pielou), med_obs = quantile(Observed, probs = c(0.25,0.5,0.75)), med_shan = quantile(Shannon, probs = c(0.25,0.5,0.75)), med_inv = quantile(InvSimpson, probs = c(0.25,0.5,0.75)), med_pie = quantile(pielou, probs = c(0.25,0.5,0.75)))
## Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
## dplyr 1.1.0.
## ℹ Please use `reframe()` instead.
## ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
##   always returns an ungrouped data frame and adjust accordingly.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## `summarise()` has grouped output by 'BPH'. You can override using the `.groups`
## argument.
BPH mean_obs mean_shan mean_inv mean_pie med_obs med_shan med_inv med_pie
No 23.46559 1.952726 5.531237 0.6277404 17.0 1.668624 3.134892 0.5518958
No 23.46559 1.952726 5.531237 0.6277404 23.0 2.027607 4.780475 0.6420510
No 23.46559 1.952726 5.531237 0.6277404 29.5 2.329260 6.876756 0.7508313
Yes 23.87719 2.039073 5.835163 0.6562695 16.0 1.746549 3.558166 0.6080032
Yes 23.87719 2.039073 5.835163 0.6562695 22.5 2.179770 5.775627 0.6852489
Yes 23.87719 2.039073 5.835163 0.6562695 30.0 2.424821 7.395575 0.7533930
# Race
meta_ps %>% 
  group_by(Race) %>% 
  summarise(mean_obs = mean(Observed), mean_shan = mean(Shannon), mean_inv =
              mean(InvSimpson), mean_pie = mean(pielou), med_obs = quantile(Observed, probs = c(0.25,0.5,0.75)), med_shan = quantile(Shannon, probs = c(0.25,0.5,0.75)), med_inv = quantile(InvSimpson, probs = c(0.25,0.5,0.75)), med_pie = quantile(pielou, probs = c(0.25,0.5,0.75)))
## Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
## dplyr 1.1.0.
## ℹ Please use `reframe()` instead.
## ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
##   always returns an ungrouped data frame and adjust accordingly.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## `summarise()` has grouped output by 'Race'. You can override using the
## `.groups` argument.
Race mean_obs mean_shan mean_inv mean_pie med_obs med_shan med_inv med_pie
1. White 23.63333 1.994355 5.696240 0.6421864 16 1.690993 3.322045 0.5714168
1. White 23.63333 1.994355 5.696240 0.6421864 23 2.106661 5.132279 0.6670039
1. White 23.63333 1.994355 5.696240 0.6421864 30 2.393067 7.170256 0.7500558
2. Other 23.89091 1.992778 5.531125 0.6356917 17 1.694556 3.234978 0.5542984
2. Other 23.89091 1.992778 5.531125 0.6356917 26 2.178873 5.023949 0.6834292
2. Other 23.89091 1.992778 5.531125 0.6356917 30 2.418197 7.822592 0.7618187
# Cancer
meta_ps %>% 
  group_by(Cancer_Type) %>% 
  summarise(mean_obs = mean(Observed), mean_shan = mean(Shannon), mean_inv =
              mean(InvSimpson), mean_pie = mean(pielou), med_obs = quantile(Observed, probs = c(0.25,0.5,0.75)), med_shan = quantile(Shannon, probs = c(0.25,0.5,0.75)), med_inv = quantile(InvSimpson, probs = c(0.25,0.5,0.75)), med_pie = quantile(pielou, probs = c(0.25,0.5,0.75)))
## Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
## dplyr 1.1.0.
## ℹ Please use `reframe()` instead.
## ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
##   always returns an ungrouped data frame and adjust accordingly.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## `summarise()` has grouped output by 'Cancer_Type'. You can override using the
## `.groups` argument.
Cancer_Type mean_obs mean_shan mean_inv mean_pie med_obs med_shan med_inv med_pie
None 23.72274 1.983064 5.521277 0.6362878 17 1.688364 3.163341 0.5647222
None 23.72274 1.983064 5.521277 0.6362878 23 2.097090 5.073466 0.6628698
None 23.72274 1.983064 5.521277 0.6362878 30 2.390805 7.196215 0.7430824
Other 23.00000 1.989983 5.741079 0.6451459 17 1.741696 3.465732 0.5802251
Other 23.00000 1.989983 5.741079 0.6451459 23 2.090204 5.195756 0.6742238
Other 23.00000 1.989983 5.741079 0.6451459 28 2.409326 7.113704 0.7709955
Prostate 24.27692 2.054767 6.359183 0.6617687 15 1.645795 3.447023 0.5840326
Prostate 24.27692 2.054767 6.359183 0.6617687 23 2.165680 5.764774 0.6943543
Prostate 24.27692 2.054767 6.359183 0.6617687 30 2.469154 7.737528 0.7687283

observed

wilcox.test(meta_ps$Observed ~ meta_ps$Prostatitis)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  meta_ps$Observed by meta_ps$Prostatitis
## W = 19948, p-value = 0.6739
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(meta_ps$Observed ~ meta_ps$Race)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  meta_ps$Observed by meta_ps$Race
## W = 10779, p-value = 0.4206
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(meta_ps$Observed ~ meta_ps$BPH)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  meta_ps$Observed by meta_ps$BPH
## W = 27867, p-value = 0.8458
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(meta_ps$Observed ~ meta_ps$Cancer)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  meta_ps$Observed by meta_ps$Cancer
## W = 25408, p-value = 0.6215
## alternative hypothesis: true location shift is not equal to 0
test_dia <- meta_ps %>% filter(Diabetes !=  "")
wilcox.test(test_dia$Observed ~ test_dia$Diabetes)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  test_dia$Observed by test_dia$Diabetes
## W = 11637, p-value = 0.3013
## alternative hypothesis: true location shift is not equal to 0

Shannon

wilcox.test(meta_ps$Shannon ~ meta_ps$Prostatitis)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  meta_ps$Shannon by meta_ps$Prostatitis
## W = 18921, p-value = 0.685
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(meta_ps$Shannon ~ meta_ps$Race)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  meta_ps$Shannon by meta_ps$Race
## W = 11180, p-value = 0.6995
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(meta_ps$Shannon ~ meta_ps$BPH)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  meta_ps$Shannon by meta_ps$BPH
## W = 25013, p-value = 0.03539
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(meta_ps$Shannon ~ meta_ps$Cancer)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  meta_ps$Shannon by meta_ps$Cancer
## W = 23940, p-value = 0.5792
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(test_dia$Shannon ~ test_dia$Diabetes)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  test_dia$Shannon by test_dia$Diabetes
## W = 11823, p-value = 0.4016
## alternative hypothesis: true location shift is not equal to 0

InvSimpson

wilcox.test(meta_ps$InvSimpson ~ meta_ps$Prostatitis)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  meta_ps$InvSimpson by meta_ps$Prostatitis
## W = 18498, p-value = 0.4555
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(meta_ps$InvSimpson ~ meta_ps$Race)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  meta_ps$InvSimpson by meta_ps$Race
## W = 11590, p-value = 0.9671
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(meta_ps$InvSimpson ~ meta_ps$BPH)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  meta_ps$InvSimpson by meta_ps$BPH
## W = 25086, p-value = 0.03987
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(meta_ps$InvSimpson ~ meta_ps$Cancer)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  meta_ps$InvSimpson by meta_ps$Cancer
## W = 23530, p-value = 0.3968
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(test_dia$InvSimpson ~ test_dia$Diabetes)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  test_dia$InvSimpson by test_dia$Diabetes
## W = 11872, p-value = 0.4309
## alternative hypothesis: true location shift is not equal to 0

Pielou

wilcox.test(meta_ps$pielou ~ meta_ps$Prostatitis)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  meta_ps$pielou by meta_ps$Prostatitis
## W = 18362, p-value = 0.3921
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(meta_ps$pielou ~ meta_ps$Race)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  meta_ps$pielou by meta_ps$Race
## W = 11642, p-value = 0.9238
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(meta_ps$pielou ~ meta_ps$BPH)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  meta_ps$pielou by meta_ps$BPH
## W = 24996, p-value = 0.03441
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(meta_ps$pielou ~ meta_ps$Cancer)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  meta_ps$pielou by meta_ps$Cancer
## W = 22824, p-value = 0.1765
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(test_dia$pielou ~ test_dia$Diabetes)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  test_dia$pielou by test_dia$Diabetes
## W = 12403, p-value = 0.8155
## alternative hypothesis: true location shift is not equal to 0
ggplot(meta_ps_long, aes(x = BPH, y = value, fill = BPH)) + geom_violin() + facet_wrap(~variable, scales = "free", nrow = 1) + 
  theme(strip.text.x = element_text(size = 12))
## Warning: Removed 3 rows containing non-finite outside the scale range
## (`stat_ydensity()`).

BPH table 1

alpha_bph <- meta_ps %>%
  select(BPH, Observed, Shannon, InvSimpson, pielou) 

alpha_table <- CreateTableOne(strata = "BPH", data = alpha_bph)
print(alpha_table, nonnormal = c("Observed", "Shannon", "InvSimpson", "pielou"))
##                            Stratified by BPH
##                             No                   Yes                  p     
##   n                           247                  228                      
##   BPH = Yes (%)                 0 (0.0)            228 (100.0)        <0.001
##   Observed (median [IQR])   23.00 [17.00, 29.50] 22.50 [16.00, 30.00]  0.846
##   Shannon (median [IQR])     2.03 [1.67, 2.33]    2.18 [1.75, 2.42]    0.035
##   InvSimpson (median [IQR])  4.78 [3.13, 6.88]    5.78 [3.56, 7.40]    0.040
##   pielou (median [IQR])      0.64 [0.55, 0.75]    0.69 [0.61, 0.75]    0.034
##                            Stratified by BPH
##                             test   
##   n                                
##   BPH = Yes (%)                    
##   Observed (median [IQR])   nonnorm
##   Shannon (median [IQR])    nonnorm
##   InvSimpson (median [IQR]) nonnorm
##   pielou (median [IQR])     nonnorm

LUTS-microbiome relationships

Sequencing reads by LUTS

fauceted_plot_sample_sums <- ggplot(sample_data(ps), aes(as.factor(LUTS), SampleSums, color = LUTS)) + geom_boxplot(fill="gray")+ theme(axis.text.x = element_text(angle = 45, hjust = 1)) + xlab("LUTS") + ylab("Sample Sums") + theme(legend.position = "none")

boxplot_plot_sample_sums <- ggplot(sample_data(ps), aes(SampleSums, color=as.factor(LUTS), fill = LUTS)) +
  geom_histogram(fill="gray", alpha=0.5, position="identity")  +
  facet_wrap("LUTS") +
  ggtitle('Number of Reads per sample by LUTS severity') +
  theme(plot.title = element_text(hjust = 0.5)) + xlab("Sample sums") + ylab("Count") + theme(axis.text.x = element_text(angle = 45, hjust = 1)) 

fauceted_plot_sample_sums + boxplot_plot_sample_sums
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

rm(fauceted_plot_sample_sums, boxplot_plot_sample_sums)
no/mild vs mod/severe
fauceted_plot_sample_sums <- ggplot(sample_data(ps), aes(as.factor(LUTS_cat), SampleSums)) + geom_boxplot(fill="gray")+ theme(axis.text.x = element_text(angle = 45, hjust = 1)) + xlab("LUTS") + ylab("Reads") + theme(legend.position = "none") + theme_bw()

boxplot_plot_sample_sums <- ggplot(sample_data(ps), aes(SampleSums)) +
  geom_histogram(alpha=0.5, position="identity")  +
  facet_wrap("LUTS_cat") +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5)) + xlab("Number of Reads") + ylab("Count") + theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "none") 

fauceted_plot_sample_sums + boxplot_plot_sample_sums
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

rm(fauceted_plot_sample_sums, boxplot_plot_sample_sums)

LUTS

alpha diversity by cat

kruskal.test(Observed ~ LUTS_cat, data = meta_ps) 
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Observed by LUTS_cat
## Kruskal-Wallis chi-squared = 1.2832, df = 1, p-value = 0.2573
kruskal.test(Shannon ~ LUTS_cat, data = meta_ps)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Shannon by LUTS_cat
## Kruskal-Wallis chi-squared = 0.008879, df = 1, p-value = 0.9249
kruskal.test(InvSimpson ~ LUTS_cat, data = meta_ps)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  InvSimpson by LUTS_cat
## Kruskal-Wallis chi-squared = 0.00049333, df = 1, p-value = 0.9823
kruskal.test(pielou ~ LUTS_cat, data = meta_ps)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  pielou by LUTS_cat
## Kruskal-Wallis chi-squared = 0.55816, df = 1, p-value = 0.455

adjusted

obs.model <- lm(Observed ~ LUTS_cat + BPH * HWBMI.x +age + Prostatitis, data = meta_ps)
summary(obs.model)
## 
## Call:
## lm(formula = Observed ~ LUTS_cat + BPH * HWBMI.x + age + Prostatitis, 
##     data = meta_ps)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -23.322  -7.049  -0.974   5.936  45.972 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             26.13039    7.81816   3.342 0.000898 ***
## LUTS_catmoderate/severe  0.83803    0.96358   0.870 0.384910    
## BPHYes                  -5.09942    6.89669  -0.739 0.460034    
## HWBMI.x                 -0.03042    0.16596  -0.183 0.854663    
## age                     -0.02919    0.08365  -0.349 0.727289    
## ProstatitisYes           0.24656    1.20096   0.205 0.837425    
## BPHYes:HWBMI.x           0.19051    0.24422   0.780 0.435756    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.08 on 468 degrees of freedom
## Multiple R-squared:  0.0041, Adjusted R-squared:  -0.008668 
## F-statistic: 0.3211 on 6 and 468 DF,  p-value: 0.9259
shan.model <- lm(Shannon ~ LUTS_cat + BPH * HWBMI.x +age + Prostatitis, data = meta_ps)
summary(shan.model)
## 
## Call:
## lm(formula = Shannon ~ LUTS_cat + BPH * HWBMI.x + age + Prostatitis, 
##     data = meta_ps)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.0136 -0.2884  0.1219  0.3751  1.7571 
## 
## Coefficients:
##                          Estimate Std. Error t value      Pr(>|t|)    
## (Intercept)              2.810774   0.469968   5.981 0.00000000441 ***
## LUTS_catmoderate/severe -0.018455   0.057923  -0.319        0.7502    
## BPHYes                  -0.741073   0.414577  -1.788        0.0745 .  
## HWBMI.x                 -0.014618   0.009977  -1.465        0.1435    
## age                     -0.006134   0.005029  -1.220        0.2231    
## ProstatitisYes           0.030925   0.072193   0.428        0.6686    
## BPHYes:HWBMI.x           0.029769   0.014681   2.028        0.0432 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6057 on 468 degrees of freedom
## Multiple R-squared:  0.01863,    Adjusted R-squared:  0.006052 
## F-statistic: 1.481 on 6 and 468 DF,  p-value: 0.1828
invsimp.model <- lm(InvSimpson ~ LUTS_cat + BPH * HWBMI.x +age + Prostatitis, data = meta_ps)
summary(invsimp.model)
## 
## Call:
## lm(formula = InvSimpson ~ LUTS_cat + BPH * HWBMI.x + age + Prostatitis, 
##     data = meta_ps)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.1607 -2.2905 -0.4567  1.4119 22.4967 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              9.78299    2.53605   3.858 0.000131 ***
## LUTS_catmoderate/severe -0.04296    0.31257  -0.137 0.890751    
## BPHYes                  -4.16899    2.23714  -1.864 0.063013 .  
## HWBMI.x                 -0.07945    0.05384  -1.476 0.140664    
## age                     -0.02796    0.02714  -1.030 0.303427    
## ProstatitisYes           0.21097    0.38957   0.542 0.588390    
## BPHYes:HWBMI.x           0.15974    0.07922   2.016 0.044334 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.269 on 468 degrees of freedom
## Multiple R-squared:  0.01451,    Adjusted R-squared:  0.00187 
## F-statistic: 1.148 on 6 and 468 DF,  p-value: 0.3333
pie.model<- lm(pielou ~ LUTS_cat  + BPH * HWBMI.x +age + Prostatitis, data = meta_ps)
summary(pie.model)
## 
## Call:
## lm(formula = pielou ~ LUTS_cat + BPH * HWBMI.x + age + Prostatitis, 
##     data = meta_ps)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.63725 -0.06964  0.02288  0.11344  0.32281 
## 
## Coefficients:
##                          Estimate Std. Error t value         Pr(>|t|)    
## (Intercept)              0.931164   0.130803   7.119 0.00000000000412 ***
## LUTS_catmoderate/severe -0.011715   0.016121  -0.727           0.4678    
## BPHYes                  -0.192421   0.115386  -1.668           0.0961 .  
## HWBMI.x                 -0.004965   0.002777  -1.788           0.0744 .  
## age                     -0.002230   0.001400  -1.593           0.1117    
## ProstatitisYes           0.016488   0.020093   0.821           0.4123    
## BPHYes:HWBMI.x           0.007929   0.004086   1.941           0.0529 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1686 on 468 degrees of freedom
## Multiple R-squared:  0.02494,    Adjusted R-squared:  0.01244 
## F-statistic: 1.995 on 6 and 468 DF,  p-value: 0.06497

looking at BMI and BPH

obs.model <- lm(Observed ~ BPH * HWBMI.x, data = meta_ps)
summary(obs.model)
## 
## Call:
## lm(formula = Observed ~ BPH * HWBMI.x, data = meta_ps)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -23.234  -6.943  -0.633   6.308  45.566 
## 
## Coefficients:
##                Estimate Std. Error t value    Pr(>|t|)    
## (Intercept)    23.92428    4.67271   5.120 0.000000446 ***
## BPHYes         -4.56316    6.81340  -0.670       0.503    
## HWBMI.x        -0.01633    0.16478  -0.099       0.921    
## BPHYes:HWBMI.x  0.17927    0.24204   0.741       0.459    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.05 on 471 degrees of freedom
## Multiple R-squared:  0.002231,   Adjusted R-squared:  -0.004124 
## F-statistic: 0.3511 on 3 and 471 DF,  p-value: 0.7884
shan.model <- lm(Shannon ~ BPH * HWBMI.x, data = meta_ps)
summary(shan.model)
## 
## Call:
## lm(formula = Shannon ~ BPH * HWBMI.x, data = meta_ps)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.0769 -0.2700  0.1126  0.3889  1.8057 
## 
## Coefficients:
##                 Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)     2.356621   0.281153   8.382 0.000000000000000606 ***
## BPHYes         -0.796195   0.409957  -1.942               0.0527 .  
## HWBMI.x        -0.014378   0.009914  -1.450               0.1477    
## BPHYes:HWBMI.x  0.031648   0.014563   2.173               0.0303 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.605 on 471 degrees of freedom
## Multiple R-squared:  0.01493,    Adjusted R-squared:  0.008658 
## F-statistic:  2.38 on 3 and 471 DF,  p-value: 0.06898
invsimp.model <- lm(InvSimpson ~ BPH * HWBMI.x, data = meta_ps)
summary(invsimp.model)
## 
## Call:
## lm(formula = InvSimpson ~ BPH * HWBMI.x, data = meta_ps)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -5.032 -2.314 -0.513  1.510 22.768 
## 
## Coefficients:
##                Estimate Std. Error t value    Pr(>|t|)    
## (Intercept)     7.71993    1.51644   5.091 0.000000516 ***
## BPHYes         -4.37685    2.21115  -1.979      0.0483 *  
## HWBMI.x        -0.07791    0.05348  -1.457      0.1458    
## BPHYes:HWBMI.x  0.16783    0.07855   2.137      0.0331 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.263 on 471 degrees of freedom
## Multiple R-squared:  0.01174,    Adjusted R-squared:  0.005442 
## F-statistic: 1.865 on 3 and 471 DF,  p-value: 0.1347
pie.model<- lm(pielou ~ BPH * HWBMI.x, data = meta_ps)
summary(pie.model)
## 
## Call:
## lm(formula = pielou ~ BPH * HWBMI.x, data = meta_ps)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.66459 -0.07155  0.02668  0.11281  0.30830 
## 
## Coefficients:
##                 Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)     0.767063   0.078414   9.782 <0.0000000000000002 ***
## BPHYes         -0.216086   0.114338  -1.890              0.0594 .  
## HWBMI.x        -0.004960   0.002765  -1.794              0.0735 .  
## BPHYes:HWBMI.x  0.008759   0.004062   2.156              0.0316 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1687 on 471 degrees of freedom
## Multiple R-squared:  0.01719,    Adjusted R-squared:  0.01093 
## F-statistic: 2.746 on 3 and 471 DF,  p-value: 0.04253

beta diversity by cat

adonis2(dist_wu_ps ~ LUTS_cat, data = meta_ps)
Df SumOfSqs R2 F Pr(>F)
Model 1 0.1304846 0.0025509 1.209656 0.258
Residual 473 51.0221492 0.9974491 NA NA
Total 474 51.1526339 1.0000000 NA NA
adonis2(dist_u_ps ~ LUTS_cat, data = meta_ps)
Df SumOfSqs R2 F Pr(>F)
Model 1 0.1907699 0.002012 0.9535977 0.514
Residual 473 94.6249829 0.997988 NA NA
Total 474 94.8157528 1.000000 NA NA
adonis2(dist_b_ps ~ LUTS_cat, data = meta_ps)
Df SumOfSqs R2 F Pr(>F)
Model 1 0.4919595 0.0028939 1.37281 0.158
Residual 473 169.5040050 0.9971061 NA NA
Total 474 169.9959645 1.0000000 NA NA

beta div adjusted

by BMI and Age only

adonis2(dist_wu_ps ~ age + HWBMI.x + LUTS_cat, data = meta_ps)
Df SumOfSqs R2 F Pr(>F)
Model 3 0.9069306 0.0177299 2.833836 0.001
Residual 471 50.2457033 0.9822701 NA NA
Total 474 51.1526339 1.0000000 NA NA
adonis2(dist_u_ps ~ age + HWBMI.x + LUTS_cat, data = meta_ps)
Df SumOfSqs R2 F Pr(>F)
Model 3 1.140161 0.012025 1.910907 0.001
Residual 471 93.675592 0.987975 NA NA
Total 474 94.815753 1.000000 NA NA
adonis2(dist_b_ps ~ age + HWBMI.x + LUTS_cat, data = meta_ps)
Df SumOfSqs R2 F Pr(>F)
Model 3 2.981539 0.0175389 2.802762 0.001
Residual 471 167.014425 0.9824611 NA NA
Total 474 169.995965 1.0000000 NA NA

by age/BPH/prostatitis

adonis2(dist_wu_ps ~ HWBMI.x + age + BPH + Prostatitis + LUTS_cat, data = meta_ps)
Df SumOfSqs R2 F Pr(>F)
Model 5 1.100456 0.0215132 2.062304 0.002
Residual 469 50.052177 0.9784868 NA NA
Total 474 51.152634 1.0000000 NA NA
adonis2(dist_u_ps ~ HWBMI.x + age + BPH + Prostatitis + LUTS_cat, data = meta_ps)
Df SumOfSqs R2 F Pr(>F)
Model 5 1.478984 0.0155985 1.486324 0.001
Residual 469 93.336769 0.9844015 NA NA
Total 474 94.815753 1.0000000 NA NA
adonis2(dist_b_ps ~ HWBMI.x + age + BPH + Prostatitis + LUTS_cat, data = meta_ps)
Df SumOfSqs R2 F Pr(>F)
Model 5 3.707805 0.0218111 2.091503 0.001
Residual 469 166.288160 0.9781889 NA NA
Total 474 169.995965 1.0000000 NA NA

Top Phyla vs LUTS

ps_n <- transform_sample_counts(ps,function(x) 100* x/sum(x))
ps_n_phy <- tax_glom(ps_n, "Phylum")
ps_melt <- psmelt(ps_n_phy)
## Warning in psmelt(ps_n_phy): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
summary_phylum <- ps_melt %>%
        group_by(Phylum, LUTS_cat) %>%
        summarise(rank_abundance = sum(Abundance), mean_abundance = mean(Abundance), median_abundance = median(Abundance), min_abundance = min(Abundance), max_abundance = max(Abundance)) %>%
        arrange(desc(rank_abundance)) %>%
        mutate(order = row_number()) %>%
        ungroup()
## `summarise()` has grouped output by 'Phylum'. You can override using the
## `.groups` argument.
# Firmicutes
ps_melt <- psmelt(ps_n_phy) %>% filter(Phylum == "Firmicutes")
## Warning in psmelt(ps_n_phy): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$LUTS_cat)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$LUTS_cat
## Kruskal-Wallis chi-squared = 0.69319, df = 1, p-value = 0.4051
# Proteobacteria
ps_melt <- psmelt(ps_n_phy) %>% filter(Phylum == "Proteobacteria")
## Warning in psmelt(ps_n_phy): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$LUTS_cat)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$LUTS_cat
## Kruskal-Wallis chi-squared = 0.000013704, df = 1, p-value = 0.997
# Actinobacteria
ps_melt <- psmelt(ps_n_phy) %>% filter(Phylum == "Actinobacteria")
## Warning in psmelt(ps_n_phy): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$LUTS_cat)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$LUTS_cat
## Kruskal-Wallis chi-squared = 1.6718, df = 1, p-value = 0.196
# Bacteroidetes
ps_melt <- psmelt(ps_n_phy) %>% filter(Phylum == "Bacteroidetes")
## Warning in psmelt(ps_n_phy): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$LUTS_cat)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$LUTS_cat
## Kruskal-Wallis chi-squared = 0.046117, df = 1, p-value = 0.83
# Fusobacteria
ps_melt <- psmelt(ps_n_phy) %>% filter(Phylum == "Fusobacteria")
## Warning in psmelt(ps_n_phy): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$LUTS_cat)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$LUTS_cat
## Kruskal-Wallis chi-squared = 0.47191, df = 1, p-value = 0.4921

Top Phyla No/Mild

ps_melt <- psmelt(ps_n_phy)
## Warning in psmelt(ps_n_phy): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
summary_phylum_nomild <- ps_melt %>% 
  filter(LUTS_cat == "no/mild") %>% 
        group_by(Phylum) %>%
        summarise(rank_abundance = sum(Abundance), mean_abundance = mean(Abundance), median_abundance = median(Abundance), min_abundance = min(Abundance), max_abundance = max(Abundance)) %>%
        arrange(desc(rank_abundance)) %>%
        mutate(order = row_number()) %>%
        ungroup()

Top Phyla Moderate/Severe

summary_phylum_modsevere <- ps_melt %>% 
  filter(LUTS_cat == "moderate/severe") %>% 
        group_by(Phylum) %>%
        summarise(rank_abundance = sum(Abundance), mean_abundance = mean(Abundance), median_abundance = median(Abundance), min_abundance = min(Abundance), max_abundance = max(Abundance)) %>%
        arrange(desc(rank_abundance)) %>%
        mutate(order = row_number()) %>%
        ungroup()

Top Genera vs LUTS

ps_n_genus <- tax_glom(ps_n, "Genus")
ps_melt <- psmelt(ps_n_genus)
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
summary_genus <- ps_melt %>%
        group_by(Genus, LUTS_cat) %>%
        summarise(rank_abundance = sum(Abundance), mean_abundance = mean(Abundance), median_abundance = median(Abundance), min_abundance = min(Abundance), max_abundance = max(Abundance)) %>%
        arrange(desc(rank_abundance)) %>%
        mutate(order = row_number()) %>%
        ungroup()
## `summarise()` has grouped output by 'Genus'. You can override using the
## `.groups` argument.
# Staphylococcus
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Staphylococcus")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$LUTS_cat)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$LUTS_cat
## Kruskal-Wallis chi-squared = 0.7535, df = 1, p-value = 0.3854
# Neisseria
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Neisseria")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$LUTS_cat)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$LUTS_cat
## Kruskal-Wallis chi-squared = 0.17884, df = 1, p-value = 0.6724
# Corynebacterium
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Corynebacterium")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$LUTS_cat)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$LUTS_cat
## Kruskal-Wallis chi-squared = 0.0071621, df = 1, p-value = 0.9326
# Prevotella
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Prevotella")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$LUTS_cat)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$LUTS_cat
## Kruskal-Wallis chi-squared = 0.81943, df = 1, p-value = 0.3653
# Streptococcus
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Streptococcus")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$LUTS_cat)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$LUTS_cat
## Kruskal-Wallis chi-squared = 0.26292, df = 1, p-value = 0.6081
# Bacteroides
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Bacteroides")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$LUTS_cat)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$LUTS_cat
## Kruskal-Wallis chi-squared = 0.90236, df = 1, p-value = 0.3422
# Anaerococcus
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Anaerococcus")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$LUTS_cat)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$LUTS_cat
## Kruskal-Wallis chi-squared = 0.0010014, df = 1, p-value = 0.9748
# Haemophilus
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Haemophilus")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$LUTS_cat)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$LUTS_cat
## Kruskal-Wallis chi-squared = 0.14812, df = 1, p-value = 0.7003
# Escherichia/Shigella
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Escherichia/Shigella")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$LUTS_cat)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$LUTS_cat
## Kruskal-Wallis chi-squared = 0.7038, df = 1, p-value = 0.4015
# Finegoldia
ps_melt <- psmelt(ps_n_genus) %>% filter(Genus == "Finegoldia")
## Warning in psmelt(ps_n_genus): The sample variables: 
## Sample
##  have been renamed to: 
## sample_Sample
## to avoid conflicts with special phyloseq plot attribute names.
kruskal.test(ps_melt$Abundance ~ ps_melt$LUTS_cat)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ps_melt$Abundance by ps_melt$LUTS_cat
## Kruskal-Wallis chi-squared = 0.69829, df = 1, p-value = 0.4034

LUTS-Subanalysis

LUTS categories -

According to AUA (american urology assoc), scores for each symptom go from 0 –> 5; 0 = never, 5 = almost always

  1. Over the past month, how often have you had the feeling of not completely emptying your bladder after you finished urinating? PSEMPTY
  2. Over the past month, how often have you had to urinate again less than 2 hours after you finished urinating? PSAGAIN
  3. Over the past month, how often have you found that you stopped and started again several times when you urinated? PSSTOP
  4. Over the past month, how often have you found it hard to hold your urine? PSPOST
  5. Over the past month, how often have you had a weak urine stream? PSWEAK
  6. Over the past month, how often have you had to push or strain to begin urination? PSPUSH
  7. Over the past month, have you had to get up to urinate during the night? Give a score to the number of times. PSUP

Creating columns

Creating a column for irritative vs obstructive symptoms. These columns will have the summed scores from the questions that pertain to each type of symptom as described by Dr. Mark Garzotto (11/18/21)

# obstructive columns
obs_col <- c('PSEMPTY', 'PSSTOP', 'PSWEAK', 'PSPUSH')

# irritative columns
irr_col <- c('PSAGAIN', 'PSPOST', 'PSUP')

meta_ps <- meta_ps %>% mutate(PS_Obstructive = rowSums(meta_ps[,obs_col]), PS_Irritative = rowSums(meta_ps[,irr_col]))

sample_data(ps) <- meta_ps
sample_data(ps_n) <- meta_ps

clean-up

rm(list=ls()[! ls() %in% c("ps","meta_ps", "BMI_genera")])

saving new ps_object

save.image(here("data", "Final_Analyses", "ps_subcat.RData"))